UNCLASSIFIED 


AD  NUMBER 


AD907764 


NEW  LIMITATION  CHANGE 
TO 

Approved  for  public  release,  distribution 
unlimited 


FROM 

Distribution  authorized  to  U.S.  Gov't, 
agencies  only;  Test  and  Evaluation;  29  JAN 
1973.  Other  requests  shall  be  referred  to 
Office  of  Naval  Research,  Arlington,  VA. 


AUTHORITY 


ONR  ltr,  29  Aug  1973 


THIS  PAGE  IS  UNCLASSIFIED 




SELECTED  STATISTICAL  TECHNIQUES 
APPLICABLE  TO 

ASW  EXERCISE  DESIGN  AND  ANALYSIS 


Prepared  by: 

Barbara W. Perneski
Peter H. Vanderwaart
Lyman L. McDonald


ANALYSIS  &  TECHNOLOGY,  INC. 


28  FEBRUARY  1973 


PREPARED  FOR: 

NAVAL  ANALYSIS  PROGRAMS 
OFFICE  OF  NAVAL  RESEARCH 
ARLINGTON, VIRGINIA 22217

CONTRACT N00014-72-C-0238
TASK NR 364-065




"Reproduction  in  whole  or  in  part  is  permitted 
for any purpose of the United States Government."


"Distribution limited to U.S. Government Agencies
only; Test and Evaluation: 29 January 1973.
Other requests for this document must be referred
to Office of Naval Research (Code 462)."




ANALYSIS  &  TECHNOLOGY,  INC.  REPORT 
Number  P-66-1-72 




Analysis & Technology, Inc., Post Office Box 220, North Stonington, Connecticut 06359, (203) 535-2977


Security Classification

DOCUMENT CONTROL DATA - R & D

(Security classification of title, body of abstract and indexing annotation must be entered when the overall report is classified)

1. ORIGINATING ACTIVITY (Corporate author)    2a. REPORT SECURITY CLASSIFICATION


Analysis & Technology, Inc.
P.O. Box 220 (Technology Park)
North Stonington, Conn. 06359


UNCLASSIFIED


2b. GROUP


3. REPORT TITLE


SELECTED STATISTICAL TECHNIQUES APPLICABLE TO
ASW EXERCISE DESIGN AND ANALYSIS


4. DESCRIPTIVE NOTES (Type of report and inclusive dates)

Final Report

5. AUTHOR(S) (First name, middle initial, last name)

Barbara W. Perneski
Peter H. Vanderwaart
Lyman L. McDonald


6. REPORT DATE


28 February 1973

8a. CONTRACT OR GRANT NO.

N00014-72-C-0238

8b. PROJECT NO.

RF 018-96-01
NR 364-065


7a. TOTAL NO. OF PAGES    7b. NO. OF REFS


9a. ORIGINATOR'S REPORT NUMBER(S)


P-66-1-72


9b. OTHER REPORT NO(S) (Any other numbers that may be assigned this report)


10. DISTRIBUTION STATEMENT


"Distribution limited to U.S. Government Agencies only; Test and Evaluation: 29 January 1973. Other requests for this document must be referred to Office of Naval Research (Code 462)."


11. SUPPLEMENTARY NOTES


12.  SPONSORING  MILITARY  ACTIVITY 


Naval  Analysis  Programs  (Code  462) 
Office  of  Naval  Research 
Arlington,  VA  22217 


13.  ABSTRACT 


This  report  contains: 


A method for calculating symmetric confidence intervals for search rate and mean time-to-detection for the active and passive area search missions.

A method for calculating approximate confidence intervals on cumulative detection probability as a function of range for the general case containing "turn-arounds" (CPAs) and "late-starters."

The application (and modification) of various techniques for calculating approximate confidence intervals on Mission Measures of Effectiveness that are in the form of products of proportions.

A discussion of the potential bias due to "late-starters" in the development of cumulative detection probability as a function of range. Recommendations for eliminating (or at least minimizing) this bias are included in this report. (U)




UNCLASSIFIED 



KEY WORDS


Cumulative  Detection  Probability 
as  a  Function  of  Range 

Confidence  Intervals 

Search  Rate 

Mean  Time-to-Detection 
Area  Search 

Confidence  Intervals  on  MOEs 
Sample  Size  Estimation 


SUMMARY 


Several statistical problems important to submarine ASW exercise design and analysis were investigated under the research contract for updating the current SUBMARINE ANALYSIS NOTEBOOK. The application (but not the development and theory) of the research will be contained in the revised edition of the Notebook.

The underlying theory of certain results of this research needs to be published separately, since it contains new techniques developed under this contract or includes the modification and/or application of methodologies not available in standard textbooks. This paper contains this theory.

The results discussed herein include:

• A method for calculating symmetric confidence intervals for search rate and mean time-to-detection for the active and passive area search missions.

• A method for calculating approximate confidence intervals on cumulative detection probability as a function of range for the general case containing "turn-arounds" (CPAs) and "late-starters."

• The application (and modification) of various techniques for calculating approximate confidence intervals on Mission Measures of Effectiveness that are in the form of products of proportions.

• A discussion of the potential bias due to "late-starters" in the development of cumulative detection probability as a function of range. Recommendations for eliminating (or at least minimizing) this bias are included in this report.
I. INTRODUCTION


Analysis & Technology, Inc. has been conducting a research study on statistical methods for the design of submarine exercises and the testing of exercise data. This research has been sponsored by NAVAL ANALYSIS PROGRAMS (Code 462) of the OFFICE OF NAVAL RESEARCH under Contract Number N00014-72-C-0238.

The purpose of this study is to provide a basis for improvement in the design, analysis, and evaluation of submarine exercises and the exercise results through the development and application of statistical techniques.

The  results  of  this  study  are  to  be  included  in  a  revised 
edition  of  the  SUBMARINE  ANALYSIS  NOTEBOOK  (reference  (1)). 

The  notebook  will  contain  instructions,  procedures, 
standardized  tests,  and  analytical  techniques  to  evaluate 
in  advance  the  data  plans  and  requirements  for  proposed 
submarine  exercises,  and  a  description  of  the  post-exercise 
statistical  testing  and  general  analysis  necessary  for  the 
evaluation  of  the  recorded  data. 

Although  the  SUBMARINE  ANALYSIS  NOTEBOOK  will  be  the 
final  product  of  this  research  and  will  utilize  all  of  the 
results  of  this  study,  it  is  intended  to  be  a  convenient 
user's  guide  and  not  a  compendium  of  statistics.  Its 
purpose  is  to  provide  sufficient  theory  to  allow  the  user 
to  employ  the  statistical  techniques  correctly  and  to 
understand  the  implications  of  the  tests.  Therefore, 
certain theory and rationale behind the various statistical tests will not be included in the Notebook. Instead, the user is referred to the appropriate textbook(s) and published papers for a detailed explanation of the theory.

Certain  results  of  the  research  conducted  under  this 
contract  merit  separate  publication  since  they  require 
new or non-standard techniques. These results are presented in this methodology paper. The treatment of the techniques discussed herein is analytical rather than computational.

Chapter II of this report presents a method for calculating confidence intervals for search rate and mean time-to-detection for the active and passive area search scenarios. This method has application in both the design phase of an exercise and the post-exercise analysis phase. In the planning stages, the test designer can use it to estimate sample size requirements for a desired confidence interval around the sample measure or, conversely, to estimate the confidence interval he can expect to obtain from a predetermined sample size. In the post-exercise analysis phase, the analyst can apply the technique to calculate exact confidence intervals around the sample measures of search rate and mean time-to-detection.

In Chapter III, a methodology is given for obtaining approximate confidence intervals on cumulative detection probability (CDP) as a function of range. Considerable research was devoted to this problem before a simple, practical method could be developed. The complexity of the problem is due to the inclusion of "turn-arounds" or CPAs and "late-starters" in the construction of the estimate of the CDP curve.

Chapter  IV  discusses  various  methods  of  establishing 
approximate  confidence  intervals  on  products  of  proportions 
with  respect  to  their  usefulness  in  ASW  applications. 

The method due to Madansky (reference (2)) is recommended for inclusion in the SUBMARINE ANALYSIS NOTEBOOK since it appears to be most applicable to the type of exercise data used for obtaining estimates of Measures of Effectiveness (MOEs) for submarine missions.

The research conducted on cumulative detection probability (CDP) as a function of range revealed that the standard estimate of this function is susceptible to bias under certain conditions. The results of an investigation of this potential bias are presented in Chapter V.


II. SYMMETRIC CONFIDENCE INTERVALS FOR SEARCH RATE AND MEAN
TIME-TO-DETECTION  FOR  THE  ACTIVE  AND  PASSIVE  AREA 
SEARCH  SCENARIOS 


Discussion 


A primary measure of interest for the area search mission is search rate as calculated from exercise data. Search rate is defined as the rate at which the ASW unit searches its area, expressed in area per unit time (for example, square nautical miles per hour). A related measure is mean time-to-detection, which is inversely proportional to search rate. These measures are applicable to both active and passive area search scenarios and are discussed in detail in the current SUBMARINE ANALYSIS NOTEBOOK (reference (1)).

In planning an area search exercise, the test designer needs to investigate sample size requirements for obtaining statistically valid estimates of search rate and mean time-to-detection. This is true whether or not he has control over the sample size. If he does have control, then he can decide how many runs to schedule in order to obtain a desired confidence interval around the exercise estimates of these measures. If he does not have control over the sample size (i.e., the sample size has been fixed prior to the design of the exercise), then he can estimate, before the exercise, the anticipated confidence interval around the sample estimates of search rate and mean time-to-detection. In either case, the designer has a tool for assessing the statistical validity of the exercise before it is conducted.

After the exercise, when the data are used to calculate estimates of search rate and mean time-to-detection, the exercise analyst can calculate confidence intervals around his estimates, as a guide to their probable accuracy.

A  method  for  determining  exact  symmetric  confidence 
intervals  on  these  measures  has  been  derived  from  Koopman's 
(reference  (3))  formulation  of  the  area  search  problem, 
and  is  presented  in  this  chapter. 

The  technique  provides  a  functional  relationship 
between  confidence  level,  the  width  of  the  confidence 
interval  as  a  percent  of  the  sample  estimate  of  the 
measures  (search  rate  or  mean  time-to-detection) ,  and 
the  number  of  detections.  It  is  the  tool  the  analyst 
needs  to  make  the  pre-exercise  decisions  and  to  do  the 
post-exercise  analysis. 

Since  sample  size  is  expressed  in  terms  of  detections 
and  not  in  terms  of  exercise  runs,  the  technique  may 
require  some  prediction  in  the  pre-exercise  phase.  Unless 
the  test  designer  is  able  to  specify  that  a  test  continue 
until  a  required  number  of  detections  has  occurred,  he 
needs  a  predicted  value  of  search  rate  or  mean  time-to- 
detection  in  order  to  convert  the  required  number  of 
detections  to  the  required  number  of  runs. 
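The conversion from a required number of detections to a required number of runs can be sketched as follows. This is a minimal illustration under stated assumptions, not the report's procedure: it assumes fixed-length runs, at most one detection per run, and the exponential detection law $\mathrm{CDP} = 1 - e^{-t/\bar{t}}$ developed in the next section, with a predicted $\bar{t}$ supplied by the designer. The function name and inputs are hypothetical, and working with the expected number of detections ignores run-to-run variability.

```python
import math


def runs_needed(detections_required: int, run_length_hours: float,
                predicted_mttd_hours: float) -> int:
    """Hypothetical helper: runs needed so the EXPECTED detections reach a target.

    Assumes each run lasts run_length_hours, yields at most one detection, and
    detects with probability 1 - exp(-T / t_bar) (exponential detection law).
    """
    p_detect = 1.0 - math.exp(-run_length_hours / predicted_mttd_hours)
    return math.ceil(detections_required / p_detect)


# Hypothetical inputs: 65 detections wanted, 24-hour runs, predicted t_bar of 28 hours.
print(runs_needed(65, 24.0, 28.0))
```

The prediction of $\bar{t}$ is the weak link here, which is why the text recommends stating the requirement in detections whenever the test can be run until that many detections occur.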


Development  of  Confidence  Intervals  for  Search  Rate 

The  cumulative  detection  probability  as  a  function  of 
time  for  a  submarine  conducting  an  area  search  is  presented 
in  the  SUBMARINE  ANALYSIS  NOTEBOOK  (reference  (1))  as: 

$$\mathrm{CDP} = 1 - e^{-\lambda t}$$

where:

$$\lambda = \frac{1}{\bar{t}} = \frac{D}{\sum_{i=1}^{N} t_i} \qquad \text{(II.1)}$$

$\bar{t}$ = sample mean time-to-detection,

$D$ = number of targets detected by the searcher,

$t_i$ = the length of the $i$th time interval of target exposure, ordered so that $t_i \ge t_{i+1}$,

$N$ = total number of time intervals $t_i$.

Further, the quantity search rate (SR) is defined as:

$$\mathrm{SR} = \frac{DA}{\sum_{i=1}^{N} t_i} = \lambda A = \frac{A}{\bar{t}} \quad \text{(area per unit time)} \qquad \text{(II.2)}$$

where:

$A$ = size of the search area in square nautical miles.


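The quantity $\bar{t}$ needs to be rewritten in a form which is consistent with the usual definition of a mean; i.e.,

$$\bar{x} = \frac{1}{M} \sum_{i=1}^{M} x_i .$$

As written in (II.1), $\bar{t}$ divides a sum of $N$ exposure intervals by the number of detections $D$, which in general is smaller than $N$.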


Referring to Dr. B. O. Koopman's Search and Screening (reference (3)), the probability $\lambda\,dt$ of detecting in a short time interval of length $dt$ is independent of time; that is, the probability of a detection occurring in any short time interval of length $dt$ is constant. Thus, it is possible to add a length of time in which no detection occurred to any other time interval, and especially to a time interval which ended in a detection.


For example, let us consider an area search exercise which produced the following data, where D denotes a run that ended in a detection and ND a run that ended with no detection.

[Figure: time line of the eight runs (time in hours), each marked D or ND; not reproduced]

Run    Length, $t_i$ (hours)    Outcome
1      6                        D
2      2.5                      D
3      8                        ND
4      2                        ND
5      10.5                     D
6      12                       ND
7      4                        D
8      6                        D

and $D$ = 5.

Runs 1, 2, 5, 7, and 8 ended in detection, while the remaining three runs ended before the target was detected. In this case, the mean time-to-detection ($\bar{t}$) can be calculated using equation (II.1):

$$\bar{t} = \frac{1}{D} \sum_{i=1}^{N} t_i = \frac{1}{5} \sum_{i=1}^{8} t_i = \frac{51}{5} = 10.2 \text{ hours}.$$

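However, since the probability of detection in a small time interval is constant, the data may be regrouped so that every time interval ends in a detection. A possible rearrangement of the data in this example is presented below:

[Figure: regrouped time line (time in hours), each interval ending in a detection; not reproduced]

where: $t_1^*$ = 6 hours, $t_2^*$ = 10.5 hours, $t_3^*$ = 12.5 hours, $t_4^*$ = 16 hours, $t_5^*$ = 6 hours.

The total exposure time (51 hours), and hence $\bar{t}$ = 51/5 = 10.2 hours, is unchanged by the regrouping.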
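The regrouping argument can be checked numerically with the run data of this example. The sketch below uses illustrative function names; the pairing of no-detection runs to detection runs is one choice that reproduces the listed $t^*$ values, not necessarily the one drawn in the original figure. It confirms that regrouping leaves the total exposure time, and hence $\bar{t}$, unchanged.

```python
# Example runs: (length in hours, ended in a detection?)
runs = [(6.0, True), (2.5, True), (8.0, False), (2.0, False),
        (10.5, True), (12.0, False), (4.0, True), (6.0, True)]


def mean_time_to_detection(intervals):
    """Equation (II.1): total exposure time divided by the number of detections."""
    total = sum(length for length, _ in intervals)
    detections = sum(1 for _, detected in intervals if detected)
    return total / detections


def regroup(intervals, assignment):
    """Append each no-detection interval to a detection interval.

    assignment maps the index of a no-detection run to the index of the detection
    run that absorbs it (an illustrative, user-chosen pairing).
    """
    lengths = {i: length for i, (length, detected) in enumerate(intervals) if detected}
    for nd_index, d_index in assignment.items():
        lengths[d_index] += intervals[nd_index][0]
    return [(lengths[i], True) for i in sorted(lengths)]


# 0-based indices: run 3 -> run 2, run 4 -> run 5, run 6 -> run 7 (1-based run numbers).
regrouped = regroup(runs, {2: 1, 3: 4, 5: 6})
print([length for length, _ in regrouped])   # [6.0, 10.5, 12.5, 16.0, 6.0]
print(mean_time_to_detection(runs), mean_time_to_detection(regrouped))  # 10.2 10.2
```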
Assume that $t_1^*, t_2^*, \ldots, t_D^*$ form a random sample from a population whose probability density function is

$$f(t) = \lambda^* e^{-\lambda^* t}, \quad 0 < t < \infty, \qquad \text{(II.3)}$$

with the cumulative distribution function

$$F(t) = 1 - e^{-\lambda^* t}, \quad 0 < t < \infty. \qquad \text{(II.4)}$$

Here $\lambda^*$ is the true detection rate; it is unknown ($\lambda^* > 0$) and is to be estimated by $\lambda = 1/\bar{t}$.

Define the random variable

$$X = 2\lambda^* t^*. \qquad \text{(II.5)}$$

Clearly, $t^* = X/2\lambda^*$ and $dt^*/dX = 1/2\lambda^*$, so by the usual change of variable technique (reference (4)), the probability density function of $X$ is

$$g(X) = \tfrac{1}{2} e^{-X/2}, \quad 0 < X < \infty, \qquad \text{(II.6)}$$

the chi-square density function with 2 degrees of freedom.

The values $X_i = 2\lambda^* t_i^*$ ($i = 1, \ldots, D$) form a random sample from this chi-square density function. Thus, by the reproductive property of the chi-square, $\sum_{i=1}^{D} X_i = 2\lambda^* \sum_{i=1}^{D} t_i^*$ is distributed as a chi-square random variable with $2D$ degrees of freedom. Let $\chi^2_{\alpha}(2D)$ and $\chi^2_{1-\beta}(2D)$ be, respectively, the $[100\alpha]$ and $[100(1-\beta)]$ percentage points of the chi-square distribution with $2D$ degrees of freedom (the values below which fractions $\alpha$ and $1-\beta$ of the distribution lie). We have
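$$P\left(\chi^2_{\alpha}(2D) \le \sum_{i=1}^{D} X_i \le \chi^2_{1-\beta}(2D)\right) = 1 - (\alpha + \beta) = \gamma. \qquad \text{(II.7)}$$

But $\sum X_i = 2\lambda^* \sum t_i^*$. Hence

$$P\left(\chi^2_{\alpha}(2D) \le 2\lambda^* \sum_{i=1}^{D} t_i^* \le \chi^2_{1-\beta}(2D)\right) = 1 - (\alpha + \beta) = \gamma \qquad \text{(II.8)}$$

or equivalently

$$P\left(\frac{\chi^2_{\alpha}(2D)}{2\sum t_i^*} \le \lambda^* \le \frac{\chi^2_{1-\beta}(2D)}{2\sum t_i^*}\right) = 1 - (\alpha + \beta) = \gamma. \qquad \text{(II.9)}$$

The endpoints of the interval in equation (II.9) give a confidence interval on $\lambda^*$ with confidence coefficient $\gamma$.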
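The chi-square step in this derivation can be sanity-checked by simulation: if each $t_i^*$ is exponential with rate $\lambda^*$, then $2\lambda^* \sum t_i^*$ should behave like a chi-square variable with $2D$ degrees of freedom, whose mean is $2D$ and variance $4D$. The rate, sample size, and seed below are arbitrary illustrative choices; `statistics.fmean` requires Python 3.8 or later.

```python
import random
import statistics


def simulate_pivot(rate: float, detections: int, trials: int, seed: int = 1):
    """Draw 2 * rate * (t_1* + ... + t_D*) with each t_i* ~ Exponential(rate)."""
    rng = random.Random(seed)
    return [2.0 * rate * sum(rng.expovariate(rate) for _ in range(detections))
            for _ in range(trials)]


# Illustrative values: rate = 0.1 per hour, D = 5, so 2D = 10 degrees of freedom.
draws = simulate_pivot(rate=0.1, detections=5, trials=20000)
# Theoretical chi-square(10) values for comparison: mean 10, variance 20.
print(statistics.fmean(draws), statistics.variance(draws))
```

Note that the result does not depend on the chosen rate, which is what makes $2\lambda^* \sum t_i^*$ usable as a pivot for the interval in (II.9).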

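Continuing, we can use the well known results

$$\mathrm{SR}^* = A\lambda^* \quad \text{and} \quad \mathrm{SR} = \frac{DA}{\sum_{i=1}^{N} t_i} = \frac{DA}{\sum_{i=1}^{D} t_i^*} \qquad \text{(II.10)}$$

where SR* = true (but unknown) search rate and SR denotes the sample estimate of the search rate. Substituting into (II.9), we have

$$P\left(\frac{\mathrm{SR}\,\chi^2_{\alpha}(2D)}{2D} \le \mathrm{SR}^* \le \frac{\mathrm{SR}\,\chi^2_{1-\beta}(2D)}{2D}\right) = 1 - (\alpha + \beta) = \gamma,$$

yielding a $\gamma = [1 - (\alpha + \beta)]\,100\%$ confidence interval on SR*. The formulas

$$L_{\alpha} = \frac{\mathrm{SR}\,\chi^2_{\alpha}(2D)}{2D} \qquad \text{(II.11)}$$

and

$$U_{1-\beta} = \frac{\mathrm{SR}\,\chi^2_{1-\beta}(2D)}{2D} \qquad \text{(II.12)}$$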
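yield asymmetrical confidence intervals (about SR) if $\alpha = \beta$. However, there is nothing inherently desirable about asymmetrical confidence limits. In fact, in pre-exercise design (i.e., determination of sample sizes or number of detections) symmetrical intervals are more desirable. Furthermore, in some cases the asymmetric intervals are wider than the corresponding symmetrical ones. To obtain symmetrical confidence intervals, find $\alpha$ and $\beta$ such that

$$\frac{\mathrm{SR}\,\chi^2_{\alpha}(2D)}{2D} = (1 - A)\,\mathrm{SR} = \mathrm{SR} - A(\mathrm{SR}) \qquad \text{(II.13)}$$

and

$$\frac{\mathrm{SR}\,\chi^2_{1-\beta}(2D)}{2D} = (1 + A)\,\mathrm{SR} = \mathrm{SR} + A(\mathrm{SR}), \qquad \text{(II.14)}$$

where $A$ is the percent accuracy, expressed as a fraction of SR, $0 < A < 1$ (not to be confused with the search area $A$ of (II.2)). That is,

$$\chi^2_{\alpha}(2D) = 2D(1 - A) \qquad \text{(II.15)}$$

and

$$\chi^2_{1-\beta}(2D) = 2D(1 + A). \qquad \text{(II.16)}$$

For given $D$ and $A$, equations (II.15) and (II.16) fix $\alpha$ and $\beta$, and hence the confidence level $\gamma = 1 - (\alpha + \beta)$.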
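The asymmetric limits (II.11) and (II.12) require chi-square percentage points. Because the degrees of freedom $2D$ are always even here, the chi-square distribution function has a closed form (a Poisson sum), so the sketch below finds the percentage points by bisection using only the Python standard library; in practice a statistics package or published tables would normally be used. Function names are illustrative, and the example inputs are borrowed from Example 3 of this chapter.

```python
import math


def chi2_cdf_even(x: float, dof: int) -> float:
    """P(X <= x) for X ~ chi-square with an even number of degrees of freedom.

    Uses 1 - sum_{k < dof/2} e^(-x/2) (x/2)^k / k!, evaluating each term in
    log space so that large x does not overflow.
    """
    if x <= 0.0:
        return 0.0
    half = x / 2.0
    tail = sum(math.exp(-half + k * math.log(half) - math.lgamma(k + 1))
               for k in range(dof // 2))
    return max(0.0, 1.0 - tail)


def chi2_ppf_even(p: float, dof: int) -> float:
    """Lower-tail percentage point: the x with chi2_cdf_even(x, dof) = p (0 < p < 1)."""
    lo, hi = 0.0, float(dof)
    while chi2_cdf_even(hi, dof) < p:  # widen the bracket until it contains p
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf_even(mid, dof) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def search_rate_interval(sr: float, detections: int, alpha: float, beta: float):
    """(L_alpha, U_{1-beta}) of equations (II.11)-(II.12) for the true search rate."""
    k = 2 * detections
    return (sr * chi2_ppf_even(alpha, k) / k,
            sr * chi2_ppf_even(1.0 - beta, k) / k)


# Illustrative use: SR = 23 sq nmi/day from 45 detections, alpha = beta = 0.05.
print(search_rate_interval(23.0, 45, 0.05, 0.05))
```

With $\alpha = \beta$ the upper limit lies farther from SR than the lower limit does, reflecting the right skew of the chi-square distribution; this is the asymmetry the symmetric construction above removes.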
Figure II.1, appearing at the end of this chapter, presents curves giving the relationship between the number of detections ($D$) and the percent accuracy of the sample search rate ($A$) for four confidence levels ($\gamma$), namely $\gamma$ = 80%, 90%, 95%, and 99%. The application of these curves is illustrated in the examples presented in the last section of this chapter.
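The relationship plotted in Figure II.1 can also be computed directly. Combining (II.15) and (II.16), the confidence level attained by a symmetric $\pm A$ interval with $D$ detections is $\gamma = F(2D(1+A)) - F(2D(1-A))$, where $F$ is the chi-square distribution function with $2D$ degrees of freedom. The sketch below (standard library only; names are illustrative) solves this for $A$ given $D$, or for the smallest $D$ given $A$, by bisection and search. It is intended for $D$ up to a few hundred.

```python
import math


def chi2_cdf_even(x: float, dof: int) -> float:
    """P(X <= x), X ~ chi-square with even dof (closed-form Poisson sum, log-space terms)."""
    if x <= 0.0:
        return 0.0
    half = x / 2.0
    tail = sum(math.exp(-half + k * math.log(half) - math.lgamma(k + 1))
               for k in range(dof // 2))
    return max(0.0, 1.0 - tail)


def sr_coverage(detections: int, accuracy: float) -> float:
    """gamma = 1 - (alpha + beta) implied by (II.15)-(II.16) for given D and A."""
    k = 2 * detections
    return chi2_cdf_even(k * (1.0 + accuracy), k) - chi2_cdf_even(k * (1.0 - accuracy), k)


def sr_accuracy(detections: int, gamma: float) -> float:
    """Smallest A in (0, 1) whose symmetric interval reaches confidence gamma."""
    if sr_coverage(detections, 1.0) < gamma:
        raise ValueError("confidence level not attainable with A < 1")
    lo, hi = 0.0, 1.0
    for _ in range(100):  # coverage increases with A, so bisect on it
        mid = (lo + hi) / 2.0
        if sr_coverage(detections, mid) < gamma:
            lo = mid
        else:
            hi = mid
    return hi


def sr_detections_needed(accuracy: float, gamma: float, max_detections: int = 300) -> int:
    """Smallest D whose symmetric +/-A interval on search rate reaches confidence gamma."""
    for d in range(1, max_detections + 1):
        if sr_coverage(d, accuracy) >= gamma:
            return d
    raise ValueError("more than max_detections detections would be required")


print(round(sr_accuracy(45, 0.80), 3))   # compare with Example 3, 80% row
print(sr_detections_needed(0.20, 0.90))  # compare with Example 1
```

Values computed this way may differ slightly from values read off the curves of Figure II.1.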

Development  of  Confidence  Intervals  for  Mean  Time-to-Detection 

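The above techniques can be adjusted to yield confidence intervals on $1/\lambda^*$, the "true" but unknown mean time-to-detection. From (II.9) we have

$$P\left(\frac{2D\bar{t}}{\chi^2_{1-\beta}(2D)} \le \frac{1}{\lambda^*} \le \frac{2D\bar{t}}{\chi^2_{\alpha}(2D)}\right) = 1 - (\alpha + \beta) \qquad \text{(II.17)}$$

where $\bar{t} = \sum_{i=1}^{N} t_i / D = \sum_{i=1}^{D} t_i^* / D$. Thus

$$L_{1-\beta} = \frac{2D\bar{t}}{\chi^2_{1-\beta}(2D)} \qquad \text{(II.18)}$$

and

$$U_{\alpha} = \frac{2D\bar{t}}{\chi^2_{\alpha}(2D)} \qquad \text{(II.19)}$$

yield asymmetrical confidence intervals on $1/\lambda^*$. To obtain symmetrical confidence intervals on $1/\lambda^*$, set $L = (1 - A)\bar{t}$ and $U = (1 + A)\bar{t}$. Solving these equations, we have

$$\chi^2_{1-\beta}(2D) = \frac{2D}{1 - A}, \qquad \text{(II.20)}$$

$$\chi^2_{\alpha}(2D) = \frac{2D}{1 + A}. \qquad \text{(II.21)}$$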


Again, by fixing $D$, one can adjust $A$, $\alpha$, and $\beta$, subject to $\gamma = 1 - (\alpha + \beta)$ = 99% (or 95%, 90%, 80%), until equations (II.20) and (II.21) are satisfied.

Figure II.2 contains graphs giving the relationship between the number of detections ($D$) and the percent accuracy of the sample mean time-to-detection ($A$) for four values of $\gamma = 1 - (\alpha + \beta)$, namely $\gamma$ = 80%, 90%, 95%, and 99%.
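The same computation applies to mean time-to-detection, with (II.20) and (II.21) in place of (II.15) and (II.16): the attained confidence level is $\gamma = F(2D/(1-A)) - F(2D/(1+A))$. The sketch below is a standard-library illustration with hypothetical function names; the inputs are the predicted values of Example 2 ($D$ = 52, $\bar{t}$ = 27.69 hours).

```python
import math


def chi2_cdf_even(x: float, dof: int) -> float:
    """P(X <= x), X ~ chi-square with even dof (closed-form Poisson sum, log-space terms)."""
    if x <= 0.0:
        return 0.0
    half = x / 2.0
    tail = sum(math.exp(-half + k * math.log(half) - math.lgamma(k + 1))
               for k in range(dof // 2))
    return max(0.0, 1.0 - tail)


def mttd_coverage(detections: int, accuracy: float) -> float:
    """gamma = 1 - (alpha + beta) implied by (II.20)-(II.21) for given D and A (0 < A < 1)."""
    k = 2 * detections
    return chi2_cdf_even(k / (1.0 - accuracy), k) - chi2_cdf_even(k / (1.0 + accuracy), k)


def mttd_accuracy(detections: int, gamma: float) -> float:
    """Smallest A whose symmetric interval (1 -/+ A) t_bar reaches confidence gamma."""
    lo, hi = 0.0, 1.0 - 1e-12
    if mttd_coverage(detections, hi) < gamma:
        raise ValueError("confidence level not attainable with A < 1")
    for _ in range(100):  # coverage increases with A, so bisect on it
        mid = (lo + hi) / 2.0
        if mttd_coverage(detections, mid) < gamma:
            lo = mid
        else:
            hi = mid
    return hi


# Example 2 inputs: D = 52 predicted detections, predicted t_bar = 27.69 hours.
t_bar = 27.69
for gamma in (0.80, 0.90, 0.95):
    a = mttd_accuracy(52, gamma)
    print(f"{gamma:.0%}: A = {a:.3f}, interval = ({t_bar * (1 - a):.2f}, {t_bar * (1 + a):.2f})")
```

The computed accuracies may differ slightly from those read from Figure II.2.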

Application 

The following examples illustrate the use of Figures II.1 and II.2 in pre- and post-exercise analysis.

Example  1  -  (Pre-Exercise) 

An active area search exercise is being designed to measure active search rate. The test designer wishes to estimate the number of detections needed to be 90% confident that a ±20% interval around the sample search rate contains the true, but unknown, value of search rate.

In this case, the designer will not attempt to predict the value of the sample search rate, since the effects of target forestalling (i.e., the target counterdetecting first and then avoiding detection) may be significant. Consequently, his estimate of the required sample size must be in terms of detections and not in terms of exercise runs.

For this problem, the following values of confidence level and percent accuracy are used:

Confidence Level = 90%

Percent Accuracy = 20%

Confidence Limits = SR ± 0.2 SR


Using Figure II.1, the required number of detections is 65.

Example  2  -  (Pre-Exercise) 

In designing a passive area search exercise, the test designer learns that the proposed exercise will consist of 60 runs, each 24 hours long. He wishes to estimate the confidence intervals around the sample mean time-to-detection prior to the conduct of the exercise.

From prediction and/or prior passive area search exercises, the test designer estimates that the cumulative detection probability at the end of each run is .87. Thus, the estimated number of detections at the end of 60 runs is approximately 52 (i.e., .87 × 60 = 52.2).

The predicted, pre-exercise value of mean time-to-detection is approximately 28 hours ($\bar{t}$ = total search time/$D$ = (60 × 24)/52 = 1440/52 ≈ 27.69 hours).

Using Figure II.2, he obtains the following estimates of three confidence intervals around the sample mean time-to-detection, based on $D$ = 52 and $\bar{t}$ = 27.69:


Confidence Level    Percent Accuracy, A    Confidence Interval ($\bar{t} - A\bar{t}$, $\bar{t} + A\bar{t}$)
80%                 .18                    (22.71, 32.67)
90%                 .23                    (21.40, 33.98)
95%                 .28                    (19.94, 35.44)


Example  3  -  (Post-Exercise) 

The analyst wishes to compute 80%, 90%, 95%, and 99% confidence intervals around the exercise (sample) value of search rate. The sample search rate was calculated to be 23 square nautical miles per day, based on 45 detections.

Using Figure II.1, he obtains the following values of the exact symmetric confidence intervals around the exercise search rate:


Confidence Level    Percent Accuracy, A    Confidence Interval (SR - A·SR, SR + A·SR)
80%                 .19                    (18.63, 27.37)
90%                 .24                    (17.48, 28.52)
95%                 .29                    (16.33, 29.67)
99%                 .38                    (14.26, 31.74)
[Figures II.1 and II.2 (not reproduced): percent accuracy (A) versus number of detections (D) for confidence levels of 80%, 90%, 95%, and 99%]


III.  APPROXIMATE  CONFIDENCE  INTERVALS  FOR  CUMULATIVE 
DETECTION  PROBABILITY  CURVES 

Discussion 

Cumulative detection probability as a function of range (CDP = f(R)) is an important performance measure of a sonar system. As developed from exercise data, it provides the analyst with an estimate of the system's detection performance in terms of the probability of detecting a target by the time the target has closed to within a specified range.

The development of CDP = f(R) is simple if each target closed until it was detected or until it reached (approximately) zero range from the detecting unit. In this case, when $N$ trials have been made, the cumulative detection probability as a function of range can be determined as:

$$\mathrm{CDP}(R) = \frac{D_R}{N}$$

where:

$D_R$ = number of detections made at range $R$ or greater.
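In this simple case the estimate is just a running fraction. The sketch below computes $\mathrm{CDP}(R) = D_R/N$ at a few ranges; the function name and the data are hypothetical, chosen only to show the calculation.

```python
def empirical_cdp(detection_ranges, n_trials, r):
    """CDP(R) = D_R / N for the simple case with no CPAs or late-starters.

    detection_ranges: ranges at which the detected targets were first detected.
    n_trials: N, the number of closing runs (detected or not).
    """
    d_r = sum(1 for det_range in detection_ranges if det_range >= r)
    return d_r / n_trials


# Hypothetical data (arbitrary range units): 10 closing runs, 7 detections;
# the other 3 targets closed to zero range undetected.
ranges = [12.0, 9.5, 9.0, 7.2, 5.0, 3.1, 1.4]
print([empirical_cdp(ranges, 10, r) for r in (10.0, 8.0, 4.0, 0.0)])  # [0.1, 0.3, 0.5, 0.7]
```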

However,  in  actual  exercises,  the  target  does  not 
continue  to  close  indefinitely  until  it  is  detected. 
Even if it were required to do so, the resultant CDP
curve would not be representative of real targets that
are  free  to  maneuver.  Thus,  a  realistic  target  may  reach 
a  closest  point  of  approach  (CPA)  after  which  it  begins 
opening  range.  In  many  cases,  such  a  target  may  never 
be  detected  during  the  run;  and  further,  the  observed 
CPA  of  a  run  may  be  at  a  lesser  range  than  several  of 
the  detection  ranges  of  other  runs.  These  undetected 
targets  are  called  CPAs  or  turn-arounds. 

In  addition,  a  typical  exercise  may  include  targets 
that  were  "late-starters" .  These  late-starters  became 
detection  opportunities  at  a  lesser  range  than  some  or 
many  of  the  detection  ranges  of  other  targets. 

A  detailed  discussion  of  the  development  of  cumulative 
detection  probability  as  a  function  of  range  for  the 
general  case  involving  CPAs  and  late-starters  is  presented 
in  the  current  SUBMARINE  ANALYSIS  NOTEBOOK  (reference  (1)). 
While  several  forms  of  the  CDP  equation  are  presented  in 
the  Notebook,  the  following  equation  is  the  most  useful 
for our development of confidence intervals:

$$\mathrm{CDP}_i = 1 - \prod_{j=1}^{i} g_j$$

where:

$\mathrm{CDP}_i$ = the cumulative detection probability at range $i$,

$g_j$ = the probability of no detection in the $j$th range band, on a target which was not detected before entering the $j$th range band.
Methodology


Several techniques (e.g., references (5) and (6)) have been proposed in attempts to solve, at least approximately, the problem of obtaining confidence intervals on cumulative detection probability (CDP) as a function of range. In the simple case with no CPAs or late-starters, confidence limits can be calculated using standard techniques, since the function

$$\mathrm{CDP}(R) = \frac{D_R}{N}$$

is, at each range, an observation on a binomial population. Specifically, there is a probability (equal to CDP(R)) that a closing target will be detected at a range greater than or equal to $R$. The observed fraction represents $D_R$ successes in $N$ trials.

In  contrast,  consider  the  more  complicated  case  where 
CPAs  and  late-starters  are  included  in  the  data.  Suppose 
a  detection  occurs  at  some  range  R*  that  is  less  than 
the  longest  detection  range,  and  less  than  at  least  one 
CPA  range  or  late-starter  starting  range.  It  is  not 
possible  to  characterize  CDP(R*)  as  D  successes  in  N 
trials  since  not  all  the  runs  that  were  valid  trials  at 
the  starting  range  are  still  valid  trials  at  R*  (due  to 
CPAs), and not all the runs that are valid trials at
R*  have  been  valid  trials  over  the  entire  interval  from 
the  starting  range  to  R*  (due  to  late-starters) .  In 
short,  although  it  is  still  possible  to  count  the 
successes,  it  is  no  longer  possible  to  count  the  number 
of  trials;  and  hence,  the  binomial  confidence  interval 
technique  is  not  directly  applicable. 
The method described here is designed to choose a suitable number to use as this unknown number of trials, so that approximate confidence limits can be calculated using the binomial distribution. The choice is made in such a way that the resulting confidence intervals reflect, in a reasonable way, the actual sample size.


Consider the following notation:

1. $R_i$ = range of the $i$th detection, ordered so that $R_{i+1} \le R_i \le R_{i-1}$.

2. $M_i$ = number of targets available at a range just less than range $R_i$.

As referred to in the Discussion, the equation for CDP given in the SUBMARINE ANALYSIS NOTEBOOK (reference (1)) is:

$$\mathrm{CDP}_i = 1 - \prod_{j=1}^{i} g_j .$$

Since $M_j + 1$ targets are at risk at range $R_j$ and one of them is detected there, the estimator for each $g_j$ can be written in the form $g_j = M_j/(M_j + 1)$, and $\mathrm{CDP}_i$ can be rewritten in the notation of this report:

3. $\mathrm{CDP}_i = 1 - \prod_{j=1}^{i} \dfrac{M_j}{M_j + 1}$ = cumulative detection probability at range $R_i$.

4. $N_i = i/\mathrm{CDP}_i$ = estimated sample size necessary for producing $i$ detections when the cumulative detection probability is $\mathrm{CDP}_i$.

5. $R_i^*$ = an arbitrary range satisfying $R_{i+1} < R_i^* < R_i$.

6. $A_i^*$ = number of "late-starters" with starting range less than $R_i$ but greater than or equal to $R_i^*$.

7. $C_i^*$ = number of "turn-arounds" where the closest point of approach (CPA) is less than $R_i$ but greater than or equal to $R_i^*$.

8. $C_i^* - A_i^*$ = net loss in number of targets (opportunities) between range $R_i$ and $R_i^*$.

9. $[N_i]$ = largest integer less than or equal to $N_i = i/\mathrm{CDP}_i$.

Given a sample size $N_i$ and a value of $\mathrm{CDP}_i$ for a range $R_i$, we could calculate the expected number of detections at or before $R_i$ by using the following equation:

$$i = \mathrm{CDP}_i \cdot N_i .$$

In this case, since $i$ and $\mathrm{CDP}_i$ are known, we can use the same equation to estimate the "effective sample size"

$$N_i = \frac{i}{\mathrm{CDP}_i} .$$
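The product form and the effective sample size can be illustrated with a short sketch. It assumes the reading $g_j = M_j/(M_j + 1)$ given above (one detection among $M_j + 1$ targets at risk at $R_j$); the function names and the data are hypothetical. With no CPAs or late-starters, $M_j = N - j$, the product telescopes to $\mathrm{CDP}_i = i/N$, and $N_i$ recovers $N$ exactly; turn-arounds between detections shrink the risk set and lower $N_i$.

```python
def cdp_product_limit(available_after):
    """CDP_i = 1 - prod_{j<=i} M_j / (M_j + 1), detections ordered by decreasing range.

    available_after[j] is M_j, the number of targets still available just inside
    the j-th detection range; M_j + 1 targets were at risk there and one was detected.
    """
    cdp, survive = [], 1.0
    for m in available_after:
        survive *= m / (m + 1)
        cdp.append(1.0 - survive)
    return cdp


def effective_sample_size(i: int, cdp_i: float) -> float:
    """N_i = i / CDP_i, the 'effective sample size' behind the first i detections."""
    return i / cdp_i


# No CPAs or late-starters, N = 10: M_j = 9, 8, 7 and CDP_i reduces to i / N.
simple = cdp_product_limit([9, 8, 7])
# Hypothetical: two turn-arounds between the first and second detections.
with_cpas = cdp_product_limit([9, 6, 5])
print([round(c, 4) for c in simple], effective_sample_size(3, simple[2]))
print([round(c, 4) for c in with_cpas], effective_sample_size(3, with_cpas[2]))
```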
There are two cases which arise in placing confidence intervals on the value of $\mathrm{CDP}_i$ at range $R_i$. In case 1, $N_i$ is an integer. Here $\mathrm{CDP}_i = i/N_i$ can be treated as $i$ detections, at ranges greater than or equal to $R_i$, out of $N_i$ (estimated) opportunities. Hence, using the tables and charts for "exact" confidence intervals on a single proportion (see, for example, reference (1)), one can obtain an approximate $\gamma\%$ confidence interval $(L_i, U_i)$ on $\mathrm{CDP}_i$, where $L_i$ and $U_i$ denote, respectively, the lower and upper limits of the interval.

In case 2, $N_i$ is not an integer. Here

$$\frac{i}{[N_i] + 1} < \mathrm{CDP}_i < \frac{i}{[N_i]} . \qquad \text{(III.1)}$$

Using the tables and charts for confidence intervals on a single proportion, we can obtain an approximate, but probably conservative, $\gamma\%$ confidence interval on $\mathrm{CDP}_i$.

For the lower limit, $L_i$, take the lower limit of a $\gamma\%$ confidence interval on the proportion $i/([N_i] + 1)$, i.e., treat $\mathrm{CDP}_i$ as $i$ detections out of $[N_i] + 1$ opportunities.

For the upper limit, $U_i$, take the upper limit of a $\gamma\%$ confidence interval on the proportion $i/[N_i]$, i.e., treat $\mathrm{CDP}_i$ as $i$ detections out of $[N_i]$ opportunities.
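The two cases can be sketched in Python. This is a minimal illustration, not the SUBMARINE ANALYSIS NOTEBOOK procedure: it assumes the "exact" single-proportion limits are the Clopper-Pearson limits (the same limits characterized by the BETA equations in Chapter IV), computes them by bisection on the binomial tail instead of reading tables and charts, and takes $\gamma$ as a fraction (0.80 for 80%). The function names are hypothetical.

```python
import math

def binom_tail_ge(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def exact_limits(x, n, gamma):
    """Two-sided "exact" (Clopper-Pearson) limits for x successes in n trials.

    gamma is the confidence level as a fraction, e.g. 0.80 for an 80% interval.
    """
    alpha = 1.0 - gamma

    def solve(k, target):
        # Bisection for p with P(X >= k | n, p) = target; the tail increases with p.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = 0.5 * (lo + hi)
            if binom_tail_ge(k, n, mid) < target:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    lower = 0.0 if x == 0 else solve(x, alpha / 2)
    # The upper limit U has P(X <= x | n, U) = alpha/2, i.e. P(X >= x+1) = 1 - alpha/2.
    upper = 1.0 if x == n else solve(x + 1, 1 - alpha / 2)
    return lower, upper

def cdp_interval(i, cdp, gamma=0.80):
    """Approximate gamma-level interval on CDP_i after i detections (cases 1 and 2)."""
    n_eff = i / cdp                       # effective sample size N_i = i / CDP_i
    n_floor = math.floor(n_eff + 1e-9)    # [N_i]; tolerance absorbs float rounding
    if abs(n_eff - n_floor) < 1e-9:       # case 1: N_i is an integer
        return exact_limits(i, n_floor, gamma)
    lower, _ = exact_limits(i, n_floor + 1, gamma)   # case 2: i out of [N_i] + 1
    _, upper = exact_limits(i, n_floor, gamma)       # case 2: i out of [N_i]
    return lower, upper

print(cdp_interval(3, 0.6, gamma=0.90))   # case 1: 3 detections out of 5
print(cdp_interval(2, 0.45))              # case 2: N_i = 4.44, so [N_i] = 4
```

The first call reproduces the 90% interval (.189, .924) quoted in Chapter IV for 3 successes in 5 trials; the second exercises case 2, where the two limits come from different numbers of opportunities.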



The above technique provides confidence intervals on CDP at those ranges where detections occurred. Consider range $R_i^*$, where $R_{i+1} < R_i^* < R_i$, and recall that the estimated CDP for range $R_i^*$ is still $\mathrm{CDP}_i$. First, consider the case where $A_i^* \ge C_i^*$ (i.e., the number of late-starters is greater than or equal to the number of turnarounds in the interval from $R_i$ to $R_i^*$). For this case, no change is recommended in the confidence interval on $\mathrm{CDP}_i$. As $R_i^*$ moves from $R_i$ to $R_{i+1}$, there will be no change in the limits of the confidence interval so long as $A_i^* \ge C_i^*$.

Secondly, consider the case where $A_i^* < C_i^*$ (the turnarounds since the last detection outnumber the late-starters since the last detection). Intuitively, there is a drop in the precision of our estimate of CDP at range $R_i^*$. This drop in precision should be expressed by calculating a wider confidence interval on $\mathrm{CDP}_i$ at range $R_i^*$. Further, it should be done in such a way that the upper limit of the interval goes to 100% if the number of opportunities, $N_i - (C_i^* - A_i^*)$, goes to zero. Since CDP is a monotonically decreasing function of range, we can still use $L_i$ as the lower limit on $\mathrm{CDP}_i$ at range $R_i^*$.

As an intermediate step in the adjustment of the upper limit, compute an estimate of CDP, say $\mathrm{CDP}_i^*$, as if the $i$th detection had occurred at range $R_i^*$. We obtain

(III.2)

where $k_i^* = C_i^* - A_i^*$. The estimated number of opportunities producing this value of CDP in $i$ detections is $N_i^* = i/\mathrm{CDP}_i^*$. Of course, our estimate of CDP at range $R_i^*$ continues to be $\mathrm{CDP}_i$, but for a (conservative) upper limit we take the upper limit of a confidence interval as if we had had $i$ detections out of $[N_i^*]$ opportunities. Note that if $k_i^* = N_i$, then there are no targets available at range $R_i^*$ and $\mathrm{CDP}_i^* = 1$. Hence, the upper limit on $\mathrm{CDP}_i$ at range $R_i^*$ will be 1. This technique also guarantees that the upper limits of the confidence intervals will converge to 1 as the range goes to 0.

Using  the  above  technique,  it  may  happen  that  the 
upper  limits  of  the  intervals  are  not  a  monotonically 
decreasing  function  of  range.  See  Figure  III.1  for  a 
sketch  of  an  artificial  example. 


Figure III.1  A Situation Where the Locus of the Upper Limits of the Confidence Intervals on CDP Is Not a Monotonic Function of Range


Non-monotonic changes in the upper limits are possible if large numbers of late-starters are introduced to a relatively small number of opportunities. One may adjust the locus of the upper limits so that it is monotonically decreasing as the range increases; this adjustment is indicated by the symbols in Figure III.1. This appears to be a reasonable adjustment of the upper limits, since the "true" CDP is a monotonically decreasing function of range.
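The text does not state whether the adjustment raises the limits that dip or lowers the ones that rise. The sketch below assumes the conservative reading, replacing each upper limit by the largest upper limit at that range or at any longer range (a running maximum taken inward from the longest range). The function name and the numbers in the example are hypothetical.

```python
def monotone_upper_limits(ranges, uppers):
    """Adjust upper confidence limits so they never increase with range.

    ranges, uppers: parallel lists (any order). Each adjusted limit is the largest
    raw upper limit at that range or at any longer range, so limits are only raised.
    """
    order = sorted(range(len(ranges)), key=lambda k: ranges[k], reverse=True)
    adjusted = [0.0] * len(uppers)
    running_max = 0.0
    for k in order:                        # walk inward from the longest range
        running_max = max(running_max, uppers[k])
        adjusted[k] = running_max
    return adjusted

# Hypothetical limits in which the limit at 6 miles dips below the one at 8 miles.
print(monotone_upper_limits([10, 8, 6, 4], [0.30, 0.50, 0.45, 0.70]))
```

Lowering the far-range limits instead would give narrower intervals; raising them keeps the adjusted intervals at least as wide as the originals.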

The data for the numerical example (fictitious) appearing in Table III.1 are taken from the current SUBMARINE ANALYSIS NOTEBOOK (reference (1)). Approximate 80% confidence limits on the CDP curve have been calculated and are included in the table. Note that because of the small number of opportunities and detections, the intervals are quite wide. The results are presented graphically in Figure III.2.


Table III.1  Computation of Cumulative Probability as a Function of Range with (Approximate) 80% Confidence Limits.


IV.  CONFIDENCE  INTERVALS  FOR  PRODUCTS  OF  PROPORTIONS 


Discussion 


It  is  well  known  that  several  Measures  of  Effectiveness 
(MOEs)  useful  in  the  analysis  of  Anti-Submarine  Warfare 
exercises  can  be  written  in  the  form 

$$\mathrm{MOE} = p_1 p_2 \cdots p_N = \prod_{i=1}^{N} p_i \qquad \text{(IV.1)}$$

where $p_i$ is the probability of success of the $i$th component of a system, given that the first $(i-1)$ components have
succeeded.  In  statistical  terms,  the  MOE  is  a  measure  of 
the  reliability  of  a  series  system  (i.e.,  a  system  in 
which  every  component  must  succeed  in  order  for  the  system 
to  succeed)  and  is  the  probability  that  the  system  will 
succeed  on  a  given  "trial". 

In  this  chapter,  the  problem  of  obtaining  (approximate) 
confidence  intervals  on  MOEs  is  considered.  A  brief 
description of techniques due to Harris (reference (7)),
Madansky (reference (2)) and Walsh (reference (8)) is
given,  with  a  discussion  of  the  conditions  under  which 
each  is  applicable.  In  addition,  an  in-depth  review  of 
a  modified  Bayesian  technique  is  presented,  along  with 
some  of  the  problems  associated  with  its  use.  Tables  are 
given  comparing  the  various  techniques  under  different 
conditions. 


Based  on  the  comparative  analysis,  the  method  due 
to  Madansky  is  recommended  for  inclusion  in  the  updated 
SUBMARINE  ANALYSIS  NOTEBOOK.  It  appears  to  be  most 
applicable to the type of exercise data used for obtaining
estimates of Measures of Effectiveness for submarine
missions. 

Theory 


Assume $X_1, \ldots, X_N$ are statistically independent random variables and for each $i$, $X_i$ has a binomial distribution with parameters $n_i$ and $p_i$. As usual, $X_i$ will be the number of "successes" out of $n_i$ "trials", where the true, but unknown, probability of success on any trial is $p_i$.

Harris gives a brief review of prior work in this area and extends a general technique developed by Buehler (reference (9)) for obtaining approximate confidence intervals whenever each $X_i$, $i = 1, \ldots, N$, is approximately Poisson distributed (i.e., roughly $n_i > 40$ with $p_i < 5/n_i$). Thus, $p_i$ must be "small", which restricts the use of the technique even in the case of "large" samples.

Walsh succeeded in deriving a function of $\hat{p}_i = X_i/n_i$, $i = 1, \ldots, N$, which can be "inverted" to obtain an approximate confidence interval on $\prod_{i=1}^{N} p_i$. The function is approximately normally distributed whenever the numbers of trials, the $n_i$'s, are moderately large and the success probabilities, the $p_i$'s, are of at least moderate size (i.e., roughly, $p_i \ge 1/2$, $n_i p_i \ge 10$, and $n_i(1-p_i) > 5$; or $p_i > 2/3$, $n_i p_i > 10$, and $n_i(1-p_i) \ge 2$). As before, the restrictions on $p_i$ limit the use of the procedure.
Madansky derives approximate confidence intervals on $\prod_{i=1}^{N} p_i$ by "inverting" the generalized likelihood ratio test and using the well-known asymptotic chi-square distribution of the likelihood ratio statistic. The technique gives good results whenever the $n_i$'s are "large" and the $p_i$'s are not close to zero (i.e., roughly, $n_i > 30$ with $p_i > 5/n_i$).
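The likelihood-ratio inversion underlying Madansky's method can be sketched as follows. This is our own minimal implementation, not code from reference (2). It assumes $0 < X_i < n_i$ for every component, takes the critical value from the chi-square distribution with one degree of freedom at confidence level $\gamma$ (computed as the square of a normal quantile), and locates the constrained maximum of the likelihood through the Lagrange-multiplier solution $p_i = (X_i - \lambda)/(n_i - \lambda)$. The function names and the data in the example are hypothetical.

```python
import math
from statistics import NormalDist

def _loglik(p, x, n):
    """Binomial log-likelihood, dropping the constant binomial coefficients."""
    return sum(xi * math.log(pi) + (ni - xi) * math.log(1.0 - pi)
               for pi, xi, ni in zip(p, x, n))

def _constrained_mle(theta, x, n):
    """Maximize the likelihood subject to prod(p_i) = theta.

    A Lagrange-multiplier argument gives p_i = (x_i - lam) / (n_i - lam); the product
    falls from 1 to 0 as lam rises toward min(x), so lam is found by bisection.
    """
    def prod(lam):
        return math.prod((xi - lam) / (ni - lam) for xi, ni in zip(x, n))
    lo, hi = -1.0, float(min(x))
    while prod(lo) < theta:                 # push lo down until prod(lo) >= theta
        lo *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if prod(mid) > theta:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return [(xi - lam) / (ni - lam) for xi, ni in zip(x, n)]

def lr_interval(x, n, gamma=0.80):
    """Likelihood-ratio interval on prod(p_i), assuming 0 < x_i < n_i for every i."""
    if any(not 0 < xi < ni for xi, ni in zip(x, n)):
        raise ValueError("this sketch assumes 0 < x_i < n_i")
    p_hat = [xi / ni for xi, ni in zip(x, n)]
    theta_hat = math.prod(p_hat)
    l_max = _loglik(p_hat, x, n)
    crit = NormalDist().inv_cdf(0.5 + gamma / 2) ** 2   # chi-square(1) quantile at gamma

    def excess(theta):                      # -2 log(LR) minus the critical value
        return 2.0 * (l_max - _loglik(_constrained_mle(theta, x, n), x, n)) - crit

    def crossing(a, b):                     # excess changes sign between a and b
        fa = excess(a)
        for _ in range(60):
            m = 0.5 * (a + b)
            fm = excess(m)
            if (fm > 0) == (fa > 0):
                a, fa = m, fm
            else:
                b = m
        return 0.5 * (a + b)

    eps = 1e-9
    return crossing(eps, theta_hat), crossing(theta_hat, 1.0 - eps)

# Hypothetical data: 40 of 50 and 30 of 40 successes, so the point estimate is .6.
print(lr_interval([40, 30], [50, 40], gamma=0.80))
```

The interval is the set of MOE values that the likelihood ratio test would not reject at level $1-\gamma$; it need not be symmetric about the point estimate.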

The above are "relatively large sample" techniques. Furthermore, each has restrictions on $p_i$, $i = 1, \ldots, N$. In the case when these restrictions are not met there are no procedures, known to the authors, for obtaining confidence intervals on $\mathrm{MOE} = \prod_{i=1}^{N} p_i$.

In an attempt to provide guidance in all cases, an investigation was made into the Bayesian confidence intervals suggested by Springer and Thompson (reference (10)). Under the assumption that each $p_i$, $i = 1, \ldots, N$, has a uniform prior distribution on the interval from zero to one, they derived the Bayes posterior distribution of the MOE, conditional on the observed values $\hat{p}_i = X_i/n_i$. The appropriate percentage points of the posterior distribution form the limits of the Bayesian confidence interval on the MOE. Unfortunately, the Bayesian procedure has two undesirable features. First, the mean of the posterior distribution is


$$\mu = \prod_{i=1}^{N} \frac{X_i + 1}{n_i + 2} . \qquad \text{(IV.2)}$$


Equation (IV.2) is a biased estimate of the MOE. As the authors point out, the estimate becomes unbiased as the $n_i$ all tend to infinity. However, for moderate sample sizes, $n_i$, the bias can be serious if $N$ is large and $p_i$ is close to 1. See Table IV.1 for some simple numerical examples.


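Table IV.1

N     Correct Estimate                          Bayesian Estimate (Biased)
      $\prod_{i=1}^{N} (X_i/n_i)$               $\prod_{i=1}^{N} (X_i+1)/(n_i+2)$

5     $\prod_{i=1}^{5} (19/20) = .7738$         $\prod_{i=1}^{5} (20/22) = .6209$
10    $\prod_{i=1}^{10} (19/20) = .5987$        $\prod_{i=1}^{10} (20/22) = .3855$
15    $\prod_{i=1}^{15} (19/20) = .4633$        $\prod_{i=1}^{15} (20/22) = .2394$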
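The entries of Table IV.1 correspond to $X_i = 19$ and $n_i = 20$ for every component, so they can be checked with a few lines of Python (the helper name is hypothetical):

```python
import math

def moe_estimates(x, n):
    """Return (prod X_i/n_i, prod (X_i+1)/(n_i+2)) for parallel lists x and n."""
    direct = math.prod(xi / ni for xi, ni in zip(x, n))
    posterior_mean = math.prod((xi + 1) / (ni + 2) for xi, ni in zip(x, n))
    return direct, posterior_mean

for terms in (5, 10, 15):
    direct, posterior_mean = moe_estimates([19] * terms, [20] * terms)
    print(terms, round(direct, 4), round(posterior_mean, 4))
```

Both estimates fall as $N$ grows, but the posterior mean falls much faster, which is the bias described above.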


A second problem with the Bayesian limits is that for small sample sizes ($n_i$'s) they are too narrow in at least one case, namely, when there is only one term in the product, i.e., the MOE is a single proportion. For this case, "exact" confidence intervals can be given as in Table IV.2. Comparison of the Bayesian limits and the "exact" limits on a single proportion leads us to modify the Bayesian technique.


Table IV.2  80% Confidence Interval on a Single Proportion
When the Sample Size =

Number of     Proportion of     Bayesian     Exact Confidence
Successes     Successes         Limits       Limits


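In the Bayesian spirit, assume that for each $i$, $p_i$ is a random variable. Given $\hat{p}_i = X_i/n_i$, the limits $(L_i, U_i)$ of an exact $1-\alpha$ confidence interval on $p_i$ are known (see, for example, reference (11)) to satisfy

$$\mathrm{BETA}(L_i;\, X_i,\, n_i - X_i + 1) = \alpha/2 \qquad \text{(IV.3)}$$

and

$$\mathrm{BETA}(U_i;\, X_i + 1,\, n_i - X_i) = 1 - \alpha/2 , \qquad \text{(IV.4)}$$

respectively. The function

$$\mathrm{BETA}(z; A, B) = \int_0^z f(t; A, B)\,dt = \frac{\Gamma(A+B)}{\Gamma(A)\,\Gamma(B)} \int_0^z t^{A-1}(1-t)^{B-1}\,dt \qquad \text{(IV.5)}$$

is the cumulative form of the beta distribution. Given $\hat{p}_i = X_i/n_i$ with $0 < X_i < n_i$, define the "interval generating function" of $p_i$ to be

$$F_i(p_i; X_i, n_i) = \begin{cases} \mathrm{BETA}(p_i; X_i, n_i - X_i + 1) & \text{if } 0 \le \mathrm{BETA}(p_i; X_i, n_i - X_i + 1) < .5, \\ .5 & \text{if } \underline{p}_i \le p_i \le \bar{p}_i, \\ \mathrm{BETA}(p_i; X_i + 1, n_i - X_i) & \text{if } 1 \ge \mathrm{BETA}(p_i; X_i + 1, n_i - X_i) > .5, \end{cases} \qquad \text{(IV.6)}$$

where $\underline{p}_i$ and $\bar{p}_i$ satisfy $\mathrm{BETA}(\underline{p}_i; X_i, n_i - X_i + 1) = .5$ and $\mathrm{BETA}(\bar{p}_i; X_i + 1, n_i - X_i) = .5$, respectively.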
A sketch of the graph of $F(p; 3, 5)$ appears in Figure IV.1. When $\hat{p} = 3/5$, confidence intervals for $p$ may be read directly from the graph. For example, with $F(p; 3, 5) = .05$ and $.95$ we obtain $p = .189$ and $.924$, respectively. The interval $(.189, .924)$ is a 90% confidence interval on $p$.
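The interval generating function (IV.6) and the reading of limits from its graph can be sketched as follows. Assumptions: BETA is evaluated only for integer parameters, through the identity $\mathrm{BETA}(z; a, b) = P(Y \ge a)$ with $Y$ binomial with $a + b - 1$ trials and success probability $z$, and the graph is "read" by bisection rather than by eye. The function names are hypothetical.

```python
import math

def beta_cdf_int(z, a, b):
    """BETA(z; a, b) for positive integers a, b, via BETA(z; a, b) = P(Y >= a),
    where Y is binomial with a + b - 1 trials and success probability z."""
    m = a + b - 1
    return sum(math.comb(m, j) * z**j * (1 - z)**(m - j) for j in range(a, m + 1))

def interval_generating_f(p, x, n):
    """F(p; x, n) of (IV.6), valid for 0 < x < n."""
    lower_branch = beta_cdf_int(p, x, n - x + 1)
    if lower_branch < 0.5:
        return lower_branch
    upper_branch = beta_cdf_int(p, x + 1, n - x)
    if upper_branch > 0.5:
        return upper_branch
    return 0.5                                 # flat section between the two medians

def invert_f(u, x, n):
    """Smallest p with F(p; x, n) >= u, found by bisection (F is nondecreasing)."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if interval_generating_f(mid, x, n) < u:
            lo = mid
        else:
            hi = mid
    return hi

alpha = 0.10
print(round(invert_f(alpha / 2, 3, 5), 3), round(invert_f(1 - alpha / 2, 3, 5), 3))
print(interval_generating_f(0.6, 3, 5))
```

The value $F(.6; 3, 5) = .5$ shows that $\hat{p} = 3/5$ lies in the flat section between the two medians.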


If $X_i = 0$, define

$$F_i(p_i; 0, n_i) = \mathrm{BETA}(p_i; 1, n_i), \quad 0 \le p_i \le 1. \qquad \text{(IV.7)}$$

If $X_i = n_i$, define

$$F_i(p_i; n_i, n_i) = \mathrm{BETA}(p_i; n_i, 1), \quad 0 \le p_i \le 1. \qquad \text{(IV.8)}$$

Sketches of the graphs of $F(p; 0, 5)$ and $F(p; 5, 5)$ appear in Figure IV.2. In the case $X_i = 0$, the lower limit, $L_i$, of the $(1-\alpha)$ confidence interval on $p_i$ is 0 and the upper limit satisfies $F_i(U_i; 0, n_i) = 1 - \alpha$. For example, from Figure IV.2 we see that a 90% confidence interval on $p$ when $X = 0$, $n = 5$ is $(0, .369)$. Similarly, if $X_i = n_i$, then the upper limit, $U_i$, is taken to be 1 and the lower limit, $L_i$, satisfies $F_i(L_i; n_i, n_i) = \alpha$. From Figure IV.2, a 90% confidence interval on $p$ is $(.631, 1)$ when $X = 5$, $n = 5$.
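For the boundary cases (IV.7) and (IV.8) the beta distribution functions have closed forms, $\mathrm{BETA}(p; 1, n) = 1 - (1-p)^n$ and $\mathrm{BETA}(p; n, 1) = p^n$, so the limits can be solved directly. As in the text, the whole of $\alpha$ is placed in the one open tail. A minimal sketch (the function name is hypothetical):

```python
def edge_case_interval(x, n, alpha):
    """(1 - alpha) interval on p from (IV.7) or (IV.8); only for x = 0 or x = n."""
    if x == 0:
        # F(U; 0, n) = BETA(U; 1, n) = 1 - (1 - U)**n = 1 - alpha
        return 0.0, 1.0 - alpha ** (1.0 / n)
    if x == n:
        # F(L; n, n) = BETA(L; n, 1) = L**n = alpha
        return alpha ** (1.0 / n), 1.0
    raise ValueError("for 0 < x < n use the two-branch function (IV.6)")

print(edge_case_interval(0, 5, 0.10))
print(edge_case_interval(5, 5, 0.10))
```

With $n = 5$ and $\alpha = .10$ these reproduce the intervals $(0, .369)$ and $(.631, 1)$ read from Figure IV.2.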


By this time it is obvious that we are requiring the percentage points of the interval generating function, $F_i(p_i; X_i, n_i)$, to be the limits of an exact confidence interval on $p_i$.


Figure IV.1  Graph of the interval generating function $F(p; 3, 5)$ giving confidence intervals on $p$ when $\hat{p} = 3/5$.


Also, the function $F_i(p_i; X_i, n_i)$ satisfies necessary and sufficient conditions (see, for example, reference (12)) to be a cumulative probability distribution function. This motivates the following definition:

Definition 4.1. Conditional on the observed value $\hat{p}_i = X_i/n_i$, the modified Bayesian "posterior" cumulative probability distribution of $p_i$ is given by $F_i(p_i; X_i, n_i)$ for $i = 1, 2, \ldots, N$.

We desire the posterior cumulative probability distribution (cpd) of $\mathrm{MOE} = \prod_{i=1}^{N} p_i$, conditional on $\prod_{i=1}^{N} \hat{p}_i$. The posterior cpd is not available in closed form. However, it can be simulated easily on a high-speed computer. The modified Bayesian confidence intervals will then be given by the appropriate percentage points in the simulated posterior cpd of the MOE.

To simulate the posterior cpd of the MOE, $R$, first generate uniformly distributed random numbers $r_i$, $i = 1, \ldots, N$, between 0 and 1 and solve the equations

$$r_i = F_i(p_i; X_i, n_i), \quad i = 1, \ldots, N, \qquad \text{(IV.9)}$$

for $p_i$. Call the solutions $p_i^*$, $i = 1, \ldots, N$. This gives random observations from the posterior distributions of $p_i$ ($i = 1, \ldots, N$). Form the product $R^* = \prod_{i=1}^{N} p_i^*$.
Definition 4.2. The product $R^* = \prod_{i=1}^{N} p_i^*$ is a random observation from the posterior cpd of the MOE.

Continuing in the above manner, generate $M$ observations from the posterior cpd of the MOE, and denote them by $R_1^*, R_2^*, \ldots, R_M^*$. Let $PC_j$ denote the $j$th percentile of the $R_k^*$'s, i.e., $PC_j$ is the value such that $j\%$ of the $R_k^*$'s are less than or equal to $PC_j$. Clearly, if $M$ is large enough, the limits of the 80% modified Bayesian confidence interval on $R$ will be given approximately by $PC_{10}$ and $PC_{90}$. The limits of the 90% modified Bayesian confidence interval on $R$ will be given approximately by $PC_5$ and $PC_{95}$, etc.
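The simulation (IV.9) and the percentile rule can be sketched in Python. This is not the GE MK II Time-Sharing Fortran program mentioned in the next paragraph: it assumes integer $X_i$ and $n_i$, evaluates BETA exactly through the binomial-sum identity, and solves $r_i = F_i(p_i; X_i, n_i)$ by bisection instead of fitting straight-line segments. Function names are hypothetical.

```python
import math
import random

def beta_cdf_int(z, a, b):
    """BETA(z; a, b) for positive integers a, b (binomial-sum identity)."""
    m = a + b - 1
    return sum(math.comb(m, j) * z**j * (1 - z)**(m - j) for j in range(a, m + 1))

def solve_f(r, x, n):
    """Solve r = F(p; x, n) for p, using (IV.6)-(IV.8), by bisection."""
    if x == n or (x > 0 and r < 0.5):
        a, b = x, n - x + 1          # branch BETA(p; x, n-x+1)
    else:
        a, b = x + 1, n - x          # branch BETA(p; x+1, n-x)
    lo, hi = 0.0, 1.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if beta_cdf_int(mid, a, b) < r:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def modified_bayes_interval(x, n, gamma=0.80, m=1000, seed=None):
    """Percentile interval from m simulated observations R* of the MOE."""
    rng = random.Random(seed)
    draws = sorted(
        math.prod(solve_f(rng.random(), xi, ni) for xi, ni in zip(x, n))
        for _ in range(m)
    )
    tail = 100.0 * (1.0 - gamma) / 2.0      # e.g. 10 for an 80% interval

    def pc(j):
        # PC_j: the k-th smallest draw, k = ceil(j% of m), so >= j% of draws are <= it
        k = max(1, math.ceil(j / 100.0 * m))
        return draws[k - 1]

    return pc(tail), pc(100.0 - tail)

print(modified_bayes_interval([1], [5], gamma=0.90, m=4000, seed=1))
```

With $N = 1$, $X = 1$, $n = 5$ the two limits should fall close to the exact 90% limits, about (.010, .657); the Monte Carlo error shrinks as $M$ grows.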

A  program  was  written  in  GE  MK  II  Time-Sharing  Fortran 
for  obtaining  the  modified  Bayesian  confidence  intervals 
and  is  available  upon  request. 

The accuracy of the program yielding the modified Bayesian intervals depends on $M$, the number of simulations, and the accuracy of the internal computer function used to evaluate the incomplete beta function. The accuracy of the internal incomplete beta function available on our computer is limited: in extreme cases, agreement with tabled values is to no more than 2 or 3 digits. This, coupled with the numerical technique used to fit the curve $F_i(p_i; X_i, n_i)$ by a series of straight lines, did not yield the 2 or 3 digit accuracy hoped for. However, so long as $M \ge 40/\alpha$, the end points of the $1-\alpha$ confidence interval on a single proportion ($N = 1$) were within $\pm .01$ of the exact confidence intervals appearing in reference (11). See, for example, the figures in Table IV.3.
Table IV.3  Confidence Intervals on $p$ when $X = 1$, $n = 5$

Confidence    Exact Confidence    Modified Bayesian Intervals
(1-α)         Limits


(.02,. 53) 
(.01, .66) 
(.005, .72) 


M=100 

M=200 

M=400 

(.02, .57) 

(.010, .64) 

(.004, .74) 

(.02, .56) 

(.01, .66) 

(.003, .75) 

(.02, .57) 

(.01, .67) 

(.007, .73) 

M=1000 


For the case $N \ge 2$, all procedures known to the authors for obtaining ordinary confidence intervals on the MOE depend on asymptotic distribution theory and hence are only approximate. For this reason, comparisons with the modified Bayesian intervals are meaningful only in the case of "large" $n_i$'s. With the exception of the last column giving the modified Bayesian limits, the figures in Table IV.4 appear in Harris (reference (7)). The modified Bayesian limits are seen to be close to the approximate ordinary limits and in fact are between Buehler's limit and Harris' limit for all but one set of the values of $p_i$ and $n_i$ used.
If the $n_i$'s are large and the $p_i$'s are not close to zero, then Madansky's likelihood ratio method is satisfactory. Table IV.5 contains one such example. Again, there is good agreement between the modified Bayesian intervals and the ordinary intervals.


Table IV.5  Approximate Confidence Limits for $\mathrm{MOE} = \prod_{i=1}^{N} p_i$, Where $p_i$ = Probability of Success at $i$th Component

$\prod_{i=1}^{4} X_i/n_i = .156$

$X_1 = 34$, $n_1 = 87$;  $X_2 = 21$, $n_2 = 33$;  $X_3 = 23$, $n_3 = 45$;  $X_4 = 21$, $n_4 = 22$

Confidence    Madansky's          Modified Bayesian    Modified Bayesian
Level         Likelihood Ratio    (M=100)              (M=400)
80%           .116, .203          .097, .209           .101, .20
90%           .106, .218          .086, .218           .083, .22
95%           .098, .231          .043, .260           .044, .23

Unfortunately, numerical computations indicate that the modified Bayesian procedure is also severely biased if the $p_i$'s are close to one and the number of terms in the product, $N$, is large. Thus, it is recommended that the product $\prod_{i=1}^{N} (X_i+1)/(n_i+2)$ be "close" to the product $\prod_{i=1}^{N} (X_i/n_i)$ before the modified Bayesian procedure is applied.


V.  A  NOTE  ON  THE  EFFECT  OF  "LATE  STARTERS"  ON  THE 
ESTIMATE  OF  CUMULATIVE  DETECTION  PROBABILITY 


Discussion 


In  conducting  the  research  for  estimating  confidence 
intervals  on  cumulative  detection  probability  (CDP)  as  a 
function of range (see Chapter III of this paper), it
was  discovered  that  the  inclusion  of  "late  starters"  in 
the  data  base  may  produce  a  biased  estimate  of  the  desired 
CDP.  This  bias  is  due  to  the  fact  that,  in  some  cases, 
the  "late  starters"  have  their  own  CDP  curve  which  may  be 
significantly  different  from  the  CDP  curve  for  "non-late 
starters".  The  magnitude  of  the  bias  of  an  estimate  of 
CDP  calculated  from  exercise  data  may  be  unknown. 

Theory 

Let $f(R)$ denote the cumulative detection probability (CDP) at range $R$ of targets whose starting ranges are beyond the (reasonable) limits of detection. It is assumed that we desire to estimate the function $f(R)$ for all $R \ge 0$. Let $f_S(R)$ denote the CDP at range $R$ of targets whose starting range is $S$, where $S$ is less than the limits of detection. For ease of presentation assume there are two groups of targets: first, $N$ targets beyond the limits of detection and then $N_1$ targets which start at range $S_1$. See Figure V.1.
Figure V.1


Consider the range "bin" $(R^*, S_1)$. Using the standard technique in reference (1), the value of $f(R^*)$ can be estimated by $\hat{f}(R^*)$. The expected value of the estimate is

$$E\big(\hat{f}(R^*)\big) = 1 - \left(\frac{N\,(1 - f(S_1))}{N}\right) \left(\frac{N\,(1 - f(R^*)) + N_1\,(1 - f_{S_1}(R^*))}{N\,(1 - f(S_1)) + N_1}\right). \qquad \text{(V.1)}$$


The right-hand side of equation V.1 is not equal to $f(R^*)$ for all possible values of the function $f_{S_1}(R^*)$, i.e., the estimation procedure is biased at $R^*$ unless the function $f_{S_1}(R)$ is such that $E(\hat{f}(R^*)) = f(R^*)$. Setting the right-hand side of equation V.1 equal to $f(R^*)$ and solving for $f_{S_1}(R^*)$, we find that

$$f_{S_1}(R^*) = \frac{f(R^*) - f(S_1)}{1 - f(S_1)} . \qquad \text{(V.2)}$$


Equation V.2 is a necessary condition in order for the standard estimate, $\hat{f}(R^*)$, to be unbiased. If $f_{S_1}(R^*) < (f(R^*) - f(S_1))/(1 - f(S_1))$, then $\hat{f}(R^*)$ is too small (on the average), and if $f_{S_1}(R^*) > (f(R^*) - f(S_1))/(1 - f(S_1))$, then $\hat{f}(R^*)$ is too large (on the average). The above remarks point out that if "late starters" have their own CDP curves, then combining data from "late starters" with "non-late starters" to estimate the latter's CDP curve may produce biased results. The magnitude of the bias in a complicated exercise will be unknown.

To  illustrate  the  above,  two  artificial  but  intuitive 
numerical  examples  follow.  Assume  that  at  range  6  miles 
the  true  CDP  is  25%  and  at  range  5  miles  the  true  CDP  is 
30%.  Assume  that  for  late  starters  at  range  6  miles,  the 
probability  is  25%  that  a  target  will  be  immediately 
detected.  After  the  immediate  detections,  assume  the 
CDP  of  the  late  starters  follows  the  "true"  CDP.  See 
Figure  V.2. 


Figure V.2  True CDP $f(R)$, CDP of late starters $f_6(R)$, and expected estimate of the true CDP, versus range (miles)


In an exercise, assume there are 20 targets whose starting ranges are beyond the limits of detection and 20 targets which start at 6 miles. At a range "slightly" more than 6 miles, say $6^+$, the standard formulas will yield an estimate of CDP whose expected value is

$$E\big(\hat{f}(6^+)\big) = 1 - (15/20) = .25 .$$

In this example, the estimation technique is unbiased for ranges greater than 6 miles. At 6 miles, 5 of the late starters will be detected immediately. In other words, we expect 15 + 15 = 30 nondetections out of 15 + 20 = 35 opportunities. The updated estimate of CDP will have the expected value

$$E\big(\hat{f}(6)\big) = 1 - (15/20)(30/35) = .357 .$$

Thus, at range 6 miles, the estimation technique is biased by 10.7 percentage points: the "true" CDP is 25%, but on the average, the estimate of the true CDP is 35.7%. In the range "bin" from 6 to 5 miles, we expect $(.05)(20) = 1$ detection from each group of targets. In other words, we expect 28 nondetections out of 30 opportunities. The updated estimate of CDP will have the expected value

$$E\big(\hat{f}(5)\big) = 1 - (.643)(28/30) = 1 - .6 = .4 .$$

The procedure is biased by 10 percentage points at range 5 miles.
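Equations V.1 and V.2 can be checked against this example with a short Python sketch (function names hypothetical). Here $f(S_1) = f(6) = .25$, the late starters' curve $f_6$ includes the 25% immediate detections, and $f(5) = f_6(5) = .30$:

```python
def expected_estimate(f_r, f_s1, f_late_r, n_far, n_late):
    """Right-hand side of V.1: expected standard estimate of f(R*) in the bin (R*, S1).

    f_r      -- f(R*), CDP of targets starting beyond detection range
    f_s1     -- f(S1), the same curve at the late starters' starting range
    f_late_r -- f_S1(R*), CDP of the late starters, immediate detections included
    n_far, n_late -- the group sizes N and N1
    """
    first_ratio = n_far * (1 - f_s1) / n_far           # survivors of N at S1, over N
    at_risk = n_far * (1 - f_s1) + n_late               # opportunities at S1
    survivors = n_far * (1 - f_r) + n_late * (1 - f_late_r)
    return 1 - first_ratio * survivors / at_risk

def required_late_cdp(f_r, f_s1):
    """Value of f_S1(R*) that makes the estimate unbiased, equation V.2."""
    return (f_r - f_s1) / (1 - f_s1)

at_6 = expected_estimate(0.25, 0.25, 0.25, 20, 20)   # true CDP at 6 miles is .25
at_5 = expected_estimate(0.30, 0.25, 0.30, 20, 20)   # true CDP at 5 miles is .30
print(round(at_6, 3), round(at_5, 3))
print(required_late_cdp(0.25, 0.25), round(required_late_cdp(0.30, 0.25), 4))
```

At both ranges $f_6(R^*)$ exceeds the value required by V.2 (0 at 6 miles, about .067 at 5 miles), so the estimate is too large, as found above.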

For  the  second  example,  assume  that  the  probability  is 
1.0  that  a  target  will  be  detected  within  one  mile  of  starting 
range  if  its  starting  range  is  less  than  8  miles.  Further, 
assume  that  the  probability  of  detection  is  uniform  over  this 
unit  interval,  and  that  there  is  no  chance  of  detection  of 
targets  at  a  range  greater  than  8  miles.  CDP  curves  for  3 
starting  ranges  are  sketched  in  Figure  V.3. 


Figure  V.3 
Consider an exercise in which we have 10 targets with starting range greater than 8 miles, 5 late starters at 7.5 miles and 5 late starters at 7 miles. The expected CDP curve, as computed by the formulas in reference (1), appears in Figure V.4. For ranges less than 7.5 miles, the estimates are too small (on the average).


Figure V.4  (horizontal axis: range in miles, 6 to 8)
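The report does not write out the formula for several groups of late starters. The sketch below assumes the natural extension of V.1: expected survivor-to-opportunity ratios are multiplied across the segments between successive starting ranges, with every target that starts at a segment's upper end counted as an opportunity there. Under the detection model of the second example it reproduces the downward bias described above. The function names are hypothetical.

```python
import math

def expected_estimate_multi(groups, r):
    """Expected standard CDP estimate at range r (assumed extension of V.1).

    groups: list of (count, start_range, cdp), where cdp(range) is the group's own
    CDP; use math.inf as start_range for targets starting beyond detection range.
    """
    starts = sorted({s for _, s, _ in groups if s >= r}, reverse=True)
    bounds = starts + [r]
    survival = 1.0
    for top, bottom in zip(bounds, bounds[1:]):
        active = [(c, s, f) for c, s, f in groups if s >= top]
        # Opportunities at `top`: earlier groups' survivors plus all targets starting there.
        at_risk = sum(c if s == top else c * (1 - f(top)) for c, s, f in active)
        survivors = sum(c * (1 - f(bottom)) for c, s, f in active)
        survival *= survivors / at_risk
    return 1 - survival

def one_mile_cdp(start):
    """Second example: detection certain, uniformly, within 1 mile; none beyond 8 miles."""
    top = min(start, 8.0)
    return lambda rng: min(1.0, max(0.0, top - rng))

groups = [(10, math.inf, one_mile_cdp(math.inf)),
          (5, 7.5, one_mile_cdp(7.5)),
          (5, 7.0, one_mile_cdp(7.0))]
true_cdp = one_mile_cdp(math.inf)
for r in (7.25, 7.0, 6.75):
    print(r, expected_estimate_multi(groups, r), true_cdp(r))
```

At 7.25 miles, for example, the expected estimate is .6875 against a true CDP of .75.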

Recommendation 

An  obvious  solution  to  this  potential  bias  due  to  "late- 
starters"  is  to  eliminate  all  of  them  from  the  data  base  for 
calculating  cumulative  detection  probability.  However,  this 
approach  is  not  desirable  since  a  significantly  large  portion 
of the data may consist of "late-starters", in which case the
sample  size  would  be  drastically  reduced.  Thus,  the  analyst 
must  use  his  discretion  in  eliminating  some  "late-starters" 
and  retaining  others. 
The following guidelines for eliminating "late-starters"
are  recommended  at  this  time: 

1)  All  "late-starters"  that  are  detected  immediately  (or 
just  about  immediately)  after  becoming  a  detection 
opportunity  should  be  removed  from  the  data  base, 
since  it  is  probable  that  they  would  have  been  detected 
at  a  longer  range  had  they  been  opportunities  at  a 
longer  range. 

2)  All  "late-starters"  that  have  a  very  short  start  range 
should  be  eliminated.  A  start  range  is  considered  to 
be  very  short  if  it  is  less  than  a  large  proportion  of 
the  detection  ranges.  (e.g.,  start  range  less  than 
50%  of  the  detection  ranges.) 

Additional research relative to the "late-starter" effect on
the  distribution  theory  of  cumulative  detection  probability 
may  result  in  the  formulation  of  different  rules. 